16 research outputs found

    Constructing cluster of simple FPGA boards for cryptologic computations

    Get PDF
    In this paper, we propose an FPGA cluster infrastructure, which can be utilized in implementing cryptanalytic attacks and accelerating cryptographic operations. The cluster can be formed using simple and inexpensive, off-the-shelf FPGA boards featuring an FPGA device, local storage, CPLD, and network connection. Forming the cluster is simple and no effort for the hardware development is needed except for the hardware design for the actual computation. Using a softcore processor on FPGA, we are able to configure FPGA devices dynamically and change their configuration on the fly from a remote computer. The softcore on FPGA can execute relatively complicated programs for mundane tasks unworthy of FPGA resources. Finally, we propose and implement a fast and efficient dynamic configuration switch technique that is shown to be useful especially in cryptanalytic applications. Our infrastructure provides a cost-effective alternative for formerly proposed cryptanalytic engines based on FPGA devices

    Flattening NTRU for Evaluation Key Free Homomorphic Encryption

    Get PDF
    We propose a new FHE scheme {\sf F-NTRU} that adopts the flattening technique proposed in GSW to derive an NTRU based scheme that (similar to GSW) does not require evaluation keys or key switching. Our scheme eliminates the decision small polynomial ratio (DSPR) assumption but relies only on the standard R-LWE assumption. It uses wide key distributions, and hence is immune to the Subfield Lattice Attack. In practice, our scheme achieves competitive timings compared to the existing schemes. We are able to compute a homomorphic multiplication in 24.424.4~msec and 34.334.3~msec for 55 and 3030 levels, respectively, without amortization. Furthermore, our scheme features small ciphertexts, e.g. 11521152~KB for 3030 levels, and eliminates the need for storing and managing costly evaluation keys. In addition, we present a slightly modified version of F-NTRU that is capable to support integer operations with a very large message space along with noise analysis for all cases. The assurance gained by using wide key distributions along with the message space flexibility of the scheme, i.e. bits, binary polynomials, and integers with a large message space, allows the use of the proposed scheme in a wide array of applications

    Constructing cluster of simple FPGA boards for cryptologic computations

    Get PDF
    In this thesis, we propose an FPGA cluster infrastructure which can be utilized in implementing cryptanalytic attacks and accelerating cryptographic operations. The cluster can be formed using simple and inexpensive, off-the-shelf FPGA boards featuring an FPGA device, local storage, CPLD, and network connection. Forming the cluster is simple and no effort for the hardware development is needed except for the hardware design for the actual computation. Using a softcore processor on FPGA, we are able to configure FPGA devices dynamically and change their configuration on the fly from a remote computer. The softcore on FPGA can execute relatively complicated programs for mundane tasks unworthy of FPGA resources. Finally, we propose and implement a fast and efficient dynamic configuration switch technique that is shown to be useful especially in cryptanalytic applications. Our infrastructure provides a cost-effective alternative for formerly proposed cryptanalytic engines based on FPGA devices

    Mayhem: Targeted Corruption of Register and Stack Variables

    Full text link
    In the past decade, many vulnerabilities were discovered in microarchitectures which yielded attack vectors and motivated the study of countermeasures. Further, architectural and physical imperfections in DRAMs led to the discovery of Rowhammer attacks which give an adversary power to introduce bit flips in a victim's memory space. Numerous studies analyzed Rowhammer and proposed techniques to prevent it altogether or to mitigate its effects. In this work, we push the boundary and show how Rowhammer can be further exploited to inject faults into stack variables and even register values in a victim's process. We achieve this by targeting the register value that is stored in the process's stack, which subsequently is flushed out into the memory, where it becomes vulnerable to Rowhammer. When the faulty value is restored into the register, it will end up used in subsequent iterations. The register value can be stored in the stack via latent function calls in the source or by actively triggering signal handlers. We demonstrate the power of the findings by applying the techniques to bypass SUDO and SSH authentication. We further outline how MySQL and other cryptographic libraries can be targeted with the new attack vector. There are a number of challenges this work overcomes with extensive experimentation before coming together to yield an end-to-end attack on an OpenSSL digital signature: achieving co-location with stack and register variables, with synchronization provided via a blocking window. We show that stack and registers are no longer safe from the Rowhammer attack

    Accelerating NTRU based Homomorphic Encryption using GPUs

    Get PDF
    In this work we introduce a large polynomial arithmetic library optimized for Nvidia GPUs to support fully homomorphic encryption schemes. To realize the large polynomial arithmetic library we convert the polynomial with large coefficients using the Chinese Remainder Theorem into many polynomials with small coefficients, and then carry out modular multiplications in the residue space using a custom developed discrete Fourier transform library. We further extend the library to support the homomorphic evaluation operations, i.e. addition, multiplication, and relinearization, in an NTRU based somewhat homomorphic encryption library. Finally, we put the library to use to evaluate homomorphic evaluation of two block ciphers: Prince and AES, which show 2.57 times and 7.6 times speedup, respectively, over an Intel Xeon software implementation

    On-the-fly Homomorphic Batching/Unbatching

    Get PDF
    We introduce a homomorphic batching technique that can be used to pack multiple ciphertext messages into one ciphertext for parallel processing. One is able to use the method to batch or unbatch messages homomorphically to further improve the flexibility of encrypted domain evaluations. In particular, we show various approaches to implement Number Theoretic Transform (NTT) homomorphically in FFT speed. Also, we present the limitations that we encounter in application of these methods. We implement homomorphic batching in various settings and present concrete performance figures. Finally, we present an implementation of a homomorphic NTT method which we process each element in an independent ciphertext. The advantage of this method is we are able to batch independent homomorphic NTT evaluations and achieve better amortized time

    A custom accelerator for homomorphic encryption applications

    Get PDF
    After the introduction of first fully homomorphic encryption scheme in 2009, numerous research work has been published aiming at making fully homomorphic encryption practical for daily use. The first fully functional scheme and a few others that have been introduced has been proven difficult to be utilized in practical applications, due to efficiency reasons. Here, we propose a custom hardware accelerator, which is optimized for a class of reconfigurable logic, for Lopez-Alt, Tromer and Vaikuntanathan’s somewhat homomorphic encryption based schemes. Our design is working as a co-processor which enables the operating system to offload the most compute–heavy operations to this specialized hardware. The core of our design is an efficient hardware implementation of a polynomial multiplier as it is the most compute–heavy operation of our target scheme. The presented architecture can compute the product of very–large polynomials in under 6.25 ms which is 102 times faster than its software implementation. In case of accelerating homomorphic applications; we estimate the per block homomorphic AES as 442 ms which is 28.5 and 17 times faster than the CPU and GPU implementations, respectively. In evaluation of Prince block cipher homomorphically, we estimate the performance as 52 ms which is 66 times faster than the CPU implementation

    MMSAT: A Scheme for Multimessage Multiuser Signature Aggregation

    Get PDF
    Post-Quantum (PQ) signature schemes are known for large key and signature sizes, which may inhibit their deployment in real world applications. In this work, we construct a PQ signature scheme MMSAT that is the first such scheme capable of aggregating unrelated messages signed individually by different parties. Our proposal extends the notion of multisignatures, which are signatures that support aggregation of signatures on a single message signed by multiple parties. Multisignatures are especially useful in blockchain applications, where a transaction may be signed by multiple users. The proposed construction achieves significant gains in bandwidth and storage requirements by allowing aggregation of unrelated transactions. Our construction is derived by extending the PASS scheme, and thus the security of our scheme relies on the hardness of the Vandermonde-SIS problem. When aggregated, a signature consists of two parts. The first part is a post-quantum size signature that grows very slowly, scaling by on the order of~logK\log{K} bits for~KK signatures. The second part scales linearly with~KK, with a very short fixed cost, roughly twice the bit security. Thus even when aggregating a modest number of signatures, the per signature cost of MMSAT is in line with that of traditional pre-quantum signature schemes such as ECDSA. As an extension to MMSAT, we describe a variant called MMSATK in which it the public keys required to verify an aggregated signature are compressed by a factor of~2020 to~3030

    HOMOMORPHIC AUTOCOMPLETE

    Get PDF
    With the rapid progress in fully homomorpic encryption (FHE) and somewhat homomorphic encryption (SHE) schemes, we are wit- nessing renewed efforts to revisit privacy preserving protocols. Several works have already appeared in the literature that provide solutions to these problems by employing FHE or SHE techniques. These applications range from cloud computing to computation over confidential patient data to several machine learning problems such as classifying privatized data. One application where privacy is a major concern is web search – a task carried out on a daily basis by billions of users around the world. In this work, we focus on a more surmountable yet essential version of the search problem, i.e. autocomplete. By utilizing a SHE scheme we propose concrete solutions to a homomorphic autocomplete problem. To investigate the real-life viability, we tackle a number of problems in the way towards a practical implementation such as communication and computational efficiency

    Accelerating LTV based homomorphic encryption in reconfigurable hardware

    Get PDF
    After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The accelerator we present is connected via a fast PCIe interface to a CPU platform to provide homomorphic evaluation services to any application that needs to support blinded computations. Specifically we introduce a number theoretical transform based multiplier architecture capable of efficiently handling very large polynomials. When synthesized for the Xilinx Virtex 7 family the presented architecture can compute the product of large polynomials in under 6.25 msec making it the fastest multiplier design of its kind currently available in the literature and is more than 102 times faster than a software implementation. Using this multiplier we can compute a relinearization operation in 526 msec. When used as an accelerator, for instance, to evaluate the AES block cipher, we estimate a per block homomorphic evaluation performance of 442 msec yielding performance gains of 28.5 and 17 times over similar CPU and GPU implementations, respectively
    corecore